ANOTHER CONJUGATE GRADIENT ALGORITHM
FOR UNCONSTRAINED OPTIMIZATION 


Neculai ANDREI 




Abstract. Another hybrid conjugate gradient algorithm is proposed and analyzed. The
parameter $\beta_k$ is computed as a convex combination of $\beta_k^{HS}$ (Hestenes-Stiefel)
and $\beta_k^{DY}$ (Dai-Yuan) conjugate gradient algorithms, i.e.
$\beta_k^{C} = (1-\theta_k)\beta_k^{HS} + \theta_k\beta_k^{DY}$. The parameter $\theta_k$ is
computed in such a way that the direction corresponding to the conjugate gradient
algorithm equals the Newton direction. The algorithm uses the standard Wolfe line
search conditions. Numerical comparisons with conjugate gradient algorithms using a
set of 750 unconstrained optimization problems, some of them from the CUTE library,
show that this hybrid computational scheme outperforms the Hestenes-Stiefel and the
Dai-Yuan conjugate gradient algorithms, as well as some other known conjugate
gradient algorithms.


Keywords: unconstrained optimization, hybrid conjugate gradient method, Newton direction, 
conjugacy condition, numerical comparisons 


Introduction. 


For solving the nonlinear unconstrained optimization problem

$$\min \{ f(x) : x \in \mathbb{R}^n \}, \tag{1}$$

where $f : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function, bounded
from below, starting from an initial guess $x_0 \in \mathbb{R}^n$, a nonlinear conjugate
gradient method generates a sequence $\{x_k\}$ as:

$$x_{k+1} = x_k + \alpha_k d_k, \tag{2}$$

where $\alpha_k > 0$ is obtained by line search, and the directions $d_k$ are generated as:

$$d_{k+1} = -g_{k+1} + \beta_k s_k, \qquad d_0 = -g_0. \tag{3}$$

In (3) $\beta_k$ is known as the conjugate gradient parameter, $s_k = x_{k+1} - x_k$
and $g_k = \nabla f(x_k)$. Let $\|\cdot\|$ denote the Euclidean norm and define
$y_k = g_{k+1} - g_k$. The line search in the conjugate gradient algorithms is often
based on the standard Wolfe conditions:

$$f(x_k + \alpha_k d_k) - f(x_k) \le \rho \alpha_k g_k^T d_k, \tag{4}$$

$$g_{k+1}^T d_k \ge \sigma g_k^T d_k, \tag{5}$$

where $d_k$ is a descent direction and $0 < \rho < \sigma < 1$. Plenty of conjugate gradient
methods are known, and an excellent survey of these methods, with special
attention to their global convergence, is given by Hager and Zhang [17]. Different
conjugate gradient algorithms correspond to different choices for the scalar
parameter $\beta_k$. The methods of Fletcher and Reeves (FR) [14], Dai and Yuan (DY) [11]
and Conjugate Descent (CD), proposed by Fletcher [13]:

$$\beta_k^{FR} = \frac{g_{k+1}^T g_{k+1}}{g_k^T g_k}, \qquad \beta_k^{DY} = \frac{g_{k+1}^T g_{k+1}}{y_k^T s_k}, \qquad \beta_k^{CD} = \frac{g_{k+1}^T g_{k+1}}{-g_k^T s_k},$$

have strong convergence properties, but they may have modest practical
performance due to jamming. On the other hand, the methods of Polak-Ribière
[20] and Polyak (PRP) [21], Hestenes and Stiefel (HS) [18] or Liu and Storey
(LS) [19]:

$$\beta_k^{PRP} = \frac{g_{k+1}^T y_k}{g_k^T g_k}, \qquad \beta_k^{HS} = \frac{g_{k+1}^T y_k}{y_k^T s_k}, \qquad \beta_k^{LS} = \frac{g_{k+1}^T y_k}{-g_k^T s_k},$$


in general may not be convergent, but they often have better computational 
performances. In order to exploit the attractive features of each set, the so called 
hybrid conjugate gradient methods have been proposed. The known hybrid 
conjugate gradient methods, summarized in [5], combine in a projective manner 
the above conjugate gradient methods. In this paper we suggest another approach 
based on a convex combination of conjugate gradient algorithms. 
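
As an illustration, the six classical choices of $\beta_k$ listed above can be evaluated side by side from $g_k$, $g_{k+1}$ and $s_k$. The following Python sketch is not taken from the paper; the helper names `dot` and `cg_betas` are hypothetical, and vectors are plain lists to keep the block self-contained.

```python
def dot(u, v):
    """Euclidean inner product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))


def cg_betas(g_old, g_new, s):
    """Return the classical beta_k values for one iteration.

    g_old = g_k, g_new = g_{k+1}, s = s_k = x_{k+1} - x_k.
    The caller is assumed to guarantee nonzero denominators.
    """
    y = [a - b for a, b in zip(g_new, g_old)]   # y_k = g_{k+1} - g_k
    gg_new = dot(g_new, g_new)                  # numerator of FR, DY, CD
    gy = dot(g_new, y)                          # numerator of PRP, HS, LS
    return {
        "FR": gg_new / dot(g_old, g_old),
        "DY": gg_new / dot(y, s),
        "CD": gg_new / (-dot(g_old, s)),
        "PRP": gy / dot(g_old, g_old),
        "HS": gy / dot(y, s),
        "LS": gy / (-dot(g_old, s)),
    }
```

With an exact line search, $g_{k+1}^T s_k = 0$, so the denominators $y_k^T s_k$ and $-g_k^T s_k$ coincide; calling `cg_betas([1.0, 0.0], [0.0, 2.0], [-0.5, 0.0])` therefore returns equal values for DY and CD, and for HS and LS.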


The hybrid conjugate gradient algorithm as a convex combination of HS and 
DY algorithms. 


Our algorithm generates the iterates $x_0, x_1, x_2, \ldots$ computed by means of the
recurrence (2), where the stepsize $\alpha_k > 0$ is determined according to the Wolfe
conditions (4) and (5), and the directions $d_k$ are generated by the rule:

$$d_{k+1} = -g_{k+1} + \beta_k^{C} s_k, \qquad d_0 = -g_0, \tag{6}$$
where

$$\beta_k^{C} = (1-\theta_k)\beta_k^{HS} + \theta_k\beta_k^{DY} = (1-\theta_k)\frac{g_{k+1}^T y_k}{y_k^T s_k} + \theta_k\frac{g_{k+1}^T g_{k+1}}{y_k^T s_k} \tag{7}$$

and $\theta_k$ is a scalar parameter satisfying $0 \le \theta_k \le 1$, which remains to be determined.
Observe that if $\theta_k = 0$, then $\beta_k^{C} = \beta_k^{HS}$, and if $\theta_k = 1$, then
$\beta_k^{C} = \beta_k^{DY}$. On the other hand, if $0 < \theta_k < 1$, then $\beta_k^{C}$ is a
convex combination of $\beta_k^{HS}$ and $\beta_k^{DY}$.


The HS method has the property that the conjugacy condition $y_k^T d_{k+1} = 0$ always
holds, independent of the line search. With an exact line search $\beta_k^{HS} = \beta_k^{PRP}$.
Therefore, the convergence properties of the HS method are similar to the
convergence properties of the PRP method.


As a consequence, by Powell's example [22], the HS method with the exact line
search, for general nonlinear functions, may not converge. The HS method has a
built-in restart feature that directly addresses the jamming phenomenon. Indeed,
when the step $x_{k+1} - x_k$ is small, the factor $y_k = g_{k+1} - g_k$ in the numerator of
$\beta_k^{HS}$ tends to zero. Hence, $\beta_k^{HS}$ becomes small and the new direction $d_{k+1}$ is
essentially the steepest descent direction $-g_{k+1}$. The performance of the HS method is
better than the performance of DY [17].
The DY method, on the other hand, always generates descent directions, and in [8]
Dai established a remarkable property for the DY conjugate gradient algorithm,
relating the descent directions to the sufficient descent condition. It is shown that
if there exist constants $\gamma_1$ and $\gamma_2$ such that $\gamma_1 \le \|g_k\| \le \gamma_2$ for all $k$,
then for any $p \in (0,1)$ there exists a constant $c > 0$ such that the sufficient descent
condition $g_i^T d_i \le -c\|g_i\|^2$ holds for at least $\lfloor pk \rfloor$ indices $i \in [0,k]$,
where $\lfloor j \rfloor$ denotes the largest integer $\le j$.


From (6) and (7) it is easy to see that

$$d_{k+1} = -g_{k+1} + (1-\theta_k)\frac{g_{k+1}^T y_k}{y_k^T s_k} s_k + \theta_k\frac{g_{k+1}^T g_{k+1}}{y_k^T s_k} s_k. \tag{8}$$

In our algorithm the parameter $\theta_k$ is selected in such a manner that the direction
$d_{k+1}$ given by (8) is the Newton direction $d_{k+1}^{N} = -\nabla^2 f(x_{k+1})^{-1} g_{k+1}$. Therefore,
from the equation

$$-\nabla^2 f(x_{k+1})^{-1} g_{k+1} = -g_{k+1} + (1-\theta_k)\frac{g_{k+1}^T y_k}{y_k^T s_k} s_k + \theta_k\frac{g_{k+1}^T g_{k+1}}{y_k^T s_k} s_k,$$

having in view that $\nabla^2 f(x_{k+1}) s_k = y_k$, after some algebra, we get

$$\theta_k = -\frac{s_k^T g_{k+1}}{g_k^T g_{k+1}}. \tag{9}$$
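
The algebra behind (9) can be sketched as follows. This is a reconstruction rather than the author's own derivation: it writes $H_{k+1}$ for $\nabla^2 f(x_{k+1})$, assumes that this matrix is symmetric and nonsingular, and uses only the relation $H_{k+1} s_k = y_k$ quoted above. Multiplying the equation for the Newton direction on the left by $s_k^T H_{k+1} = y_k^T$ gives

$$
\begin{aligned}
-s_k^T g_{k+1} &= -y_k^T g_{k+1} + \frac{(1-\theta_k)\, g_{k+1}^T y_k + \theta_k\, g_{k+1}^T g_{k+1}}{y_k^T s_k}\; s_k^T H_{k+1} s_k \\
&= -y_k^T g_{k+1} + (1-\theta_k)\, g_{k+1}^T y_k + \theta_k\, g_{k+1}^T g_{k+1} \qquad (\text{since } s_k^T H_{k+1} s_k = y_k^T s_k) \\
&= \theta_k\, g_{k+1}^T (g_{k+1} - y_k) = \theta_k\, g_k^T g_{k+1}.
\end{aligned}
$$

Dividing by $g_k^T g_{k+1}$, when it is nonzero, yields the expression for $\theta_k$.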


Theorem 1.

Assume that $d_k$ is a descent direction and $\alpha_k$ in algorithm (2) and (8),
where $\theta_k$ is given by (9), is determined by the Wolfe line search (4) and (5). If
$0 < \theta_k < 1$, and

$$\frac{(y_k^T g_{k+1})(s_k^T g_{k+1})}{y_k^T s_k} \le \|g_{k+1}\|^2, \tag{10}$$

then the direction $d_{k+1}$ given by (8) is a descent direction.
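Proof.
From (8) and (9) we get

$$g_{k+1}^T d_{k+1} = -\|g_{k+1}\|^2 + \left(1 + \frac{s_k^T g_{k+1}}{g_k^T g_{k+1}}\right)\frac{(y_k^T g_{k+1})(s_k^T g_{k+1})}{y_k^T s_k} - \frac{(s_k^T g_{k+1})^2}{(g_k^T g_{k+1})(y_k^T s_k)}\|g_{k+1}\|^2. \tag{11}$$

Since $s_k^T g_k < 0$, it follows that $s_k^T g_{k+1} = y_k^T s_k + s_k^T g_k < y_k^T s_k$, i.e.

$$\frac{s_k^T g_{k+1}}{y_k^T s_k} < 1. \tag{12}$$

On the other hand, $0 < \theta_k < 1$, hence

$$0 < 1 + \frac{s_k^T g_{k+1}}{g_k^T g_{k+1}} < 1. \tag{13}$$

Therefore, from (10) we have

$$g_{k+1}^T d_{k+1} \le -\|g_{k+1}\|^2 + \left(1 + \frac{s_k^T g_{k+1}}{g_k^T g_{k+1}}\right)\|g_{k+1}\|^2 - \frac{(s_k^T g_{k+1})^2}{(g_k^T g_{k+1})(y_k^T s_k)}\|g_{k+1}\|^2$$

$$= -\frac{s_k^T g_{k+1}}{g_k^T g_{k+1}} \cdot \frac{s_k^T g_k}{y_k^T s_k}\,\|g_{k+1}\|^2 < 0, \tag{14}$$

proving that the direction $d_{k+1}$ is a descent direction.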
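
To make Theorem 1 concrete, the quantities in (10) and (14) can be evaluated numerically for given $g_k$, $g_{k+1}$ and $s_k$. The sketch below is an illustration only, not part of the paper; `dot` and `descent_check` are hypothetical names, and the right-hand side of (14) is written in the equivalent form $\theta_k\,(s_k^T g_k / y_k^T s_k)\,\|g_{k+1}\|^2$, which follows from (9).

```python
def dot(u, v):
    """Euclidean inner product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))


def descent_check(g_old, g_new, s):
    """Return (left side of (10), g_{k+1}^T d_{k+1}, bound from (14)).

    Assumes y_k^T s_k != 0 and g_k^T g_{k+1} != 0.
    """
    y = [a - b for a, b in zip(g_new, g_old)]
    ys, gn2 = dot(y, s), dot(g_new, g_new)
    theta = -dot(s, g_new) / dot(g_old, g_new)                  # eq. (9)
    beta = ((1.0 - theta) * dot(g_new, y) + theta * gn2) / ys   # eq. (7)
    d_new = [-g + beta * si for g, si in zip(g_new, s)]         # eq. (8)
    lhs10 = dot(y, g_new) * dot(s, g_new) / ys
    bound14 = theta * dot(s, g_old) / ys * gn2
    return lhs10, dot(g_new, d_new), bound14
```

For example, `descent_check([2.0, 0.0], [1.0, 1.0], [-1.0, 0.0])` corresponds to $\theta_k = 0.5$; the first returned value does not exceed $\|g_{k+1}\|^2 = 2$, so (10) holds, and the second value is below the third, which is negative.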


Theorem 2.
Assume that the conditions in Theorem 1 hold. If there exists a constant
$c_1 > 0$ such that $0 < c_1 \le \theta_k < 1$, then there exists a constant $\delta > 0$ such that

$$g_{k+1}^T d_{k+1} \le -\delta\|g_{k+1}\|^2, \tag{15}$$

i.e. the direction $d_{k+1}$ satisfies the sufficient descent condition.
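Proof.
From (14) we have

$$g_{k+1}^T d_{k+1} \le -\frac{s_k^T g_{k+1}}{g_k^T g_{k+1}} \cdot \frac{s_k^T g_k}{y_k^T s_k}\,\|g_{k+1}\|^2. \tag{16}$$

Since $y_k^T s_k > 0$ and $s_k^T g_k < 0$, it follows that there exists a constant $c_2 > 0$
such that $g_k^T s_k \le -c_2 (y_k^T s_k) < 0$. On the other hand, since $1 > \theta_k \ge c_1 > 0$, then
$s_k^T g_{k+1} \le -c_1 (g_k^T g_{k+1})$. Therefore, from (16) we have

$$g_{k+1}^T d_{k+1} \le -\frac{s_k^T g_{k+1}}{g_k^T g_{k+1}} \cdot \frac{s_k^T g_k}{y_k^T s_k}\,\|g_{k+1}\|^2 \le -c_1 c_2 \|g_{k+1}\|^2 = -\delta\|g_{k+1}\|^2,$$

where $\delta = c_1 c_2 > 0$.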


The parameter $\theta_k$ given by (9) can be outside the interval [0,1]. However, in
order to have a real convex combination in (7) the following rule is considered: if
$\theta_k \le 0$, then set $\theta_k = 0$ in (7), i.e. $\beta_k^{C} = \beta_k^{HS}$; if $\theta_k \ge 1$, then
take $\theta_k = 1$ in (7), i.e. $\beta_k^{C} = \beta_k^{DY}$. Therefore, under this rule for $\theta_k$
selection, the direction $d_{k+1}$ in (8) combines in a convex manner the HS and DY algorithms.
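
The selection rule above amounts to clipping $\theta_k$ to [0,1] before forming the convex combination. A minimal Python sketch follows; `dot` and `hybrid_beta` are hypothetical names, and the treatment of $g_k^T g_{k+1} = 0$ follows Step 4 of the algorithm stated in the paper.

```python
def dot(u, v):
    """Euclidean inner product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))


def hybrid_beta(g_old, g_new, s):
    """Return (theta_k, beta_k^C) from (9) and (7), with theta_k clipped to [0, 1].

    Assumes y_k^T s_k > 0, which the Wolfe conditions give for a descent direction.
    """
    y = [a - b for a, b in zip(g_new, g_old)]
    ys = dot(y, s)
    beta_hs = dot(g_new, y) / ys          # Hestenes-Stiefel
    beta_dy = dot(g_new, g_new) / ys      # Dai-Yuan
    gg = dot(g_old, g_new)
    theta = 0.0 if gg == 0.0 else -dot(s, g_new) / gg   # eq. (9)
    theta = min(1.0, max(0.0, theta))     # theta <= 0 -> HS, theta >= 1 -> DY
    return theta, (1.0 - theta) * beta_hs + theta * beta_dy
```

Clipping keeps $\beta_k^{C}$ between the HS and DY values, so the two pure methods appear as the limiting cases of the same formula.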


The NDHSDY algorithm 


Step 1. Initialization. Select $x_0 \in \mathbb{R}^n$ and the parameters $0 < \rho < \sigma < 1$. Compute
$f(x_0)$ and $g_0$. Consider $d_0 = -g_0$ and set $\alpha_0 = 1/\|g_0\|$.

Step 2. Test for continuation of iterations. If $\|g_k\|_\infty \le 10^{-6}$, then stop.

Step 3. Line search. Compute $\alpha_k > 0$ satisfying the Wolfe line search conditions
(4) and (5) and update the variables $x_{k+1} = x_k + \alpha_k d_k$. Compute $f(x_{k+1})$, $g_{k+1}$ and
$s_k = x_{k+1} - x_k$, $y_k = g_{k+1} - g_k$.

Step 4. $\theta_k$ parameter computation. If $g_k^T g_{k+1} = 0$, then set $\theta_k = 0$, otherwise
compute $\theta_k$ as in (9).

Step 5. $\beta_k^{C}$ conjugate gradient parameter computation. If $0 < \theta_k < 1$, then
compute $\beta_k^{C}$ as in (7). If $\theta_k \ge 1$, then set $\beta_k^{C} = \beta_k^{DY}$. If $\theta_k \le 0$, then
set $\beta_k^{C} = \beta_k^{HS}$.

Step 6. Direction computation. Compute $d = -g_{k+1} + \beta_k^{C} s_k$. If the restart criterion
of Powell

$$|g_{k+1}^T g_k| \ge 0.2\,\|g_{k+1}\|^2 \tag{17}$$

is satisfied, then restart, i.e. set $d_{k+1} = -g_{k+1}$, otherwise define $d_{k+1} = d$. Compute
the initial guess $\alpha_k = \alpha_{k-1}\|d_{k-1}\| / \|d_k\|$, set $k = k+1$ and continue with step 2.
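
Steps 1 to 6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's Fortran code: the line search `wolfe_search` is a simple bisection-type procedure chosen here for brevity, the names `dot`, `axpy`, `wolfe_search` and `ndhsdy` are hypothetical, and two safeguards that are not in the paper are added (a restart when $y_k^T s_k \le 0$ and a restart when the computed direction is not a descent direction).

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def axpy(a, x, y):
    """Return a*x + y for plain Python lists."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def wolfe_search(f, grad, x, d, alpha, rho=1e-4, sigma=0.9, max_iter=60):
    """Bisection-type search for a step satisfying (4) and (5) (assumed helper).

    Returns the last trial step if max_iter is reached without success.
    """
    f0, slope0 = f(x), dot(grad(x), d)
    lo, hi = 0.0, math.inf
    for _ in range(max_iter):
        x_new = axpy(alpha, d, x)
        if f(x_new) - f0 > rho * alpha * slope0:      # (4) violated: shrink
            hi = alpha
        elif dot(grad(x_new), d) < sigma * slope0:    # (5) violated: grow
            lo = alpha
        else:
            break                                     # both conditions hold
        alpha = 0.5 * (lo + hi) if hi < math.inf else 2.0 * alpha
    return alpha


def ndhsdy(f, grad, x0, tol=1e-6, max_iter=1000):
    """Return (approximate minimizer, iterations used)."""
    x, g = list(x0), grad(x0)
    d = [-gi for gi in g]                             # Step 1: d_0 = -g_0
    alpha, d_prev_norm = 0.0, 0.0
    for k in range(max_iter):
        if max(abs(gi) for gi in g) <= tol:           # Step 2: infinity-norm test
            return x, k
        d_norm = math.sqrt(dot(d, d))
        # Trial step: 1/||g_0|| at k = 0, else alpha_{k-1} ||d_{k-1}|| / ||d_k||
        alpha = 1.0 / d_norm if k == 0 else alpha * d_prev_norm / d_norm
        alpha = wolfe_search(f, grad, x, d, alpha)    # Step 3
        x_new = axpy(alpha, d, x)
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        ys, gg = dot(y, s), dot(g, g_new)
        theta = 0.0 if gg == 0.0 else -dot(s, g_new) / gg   # Step 4, eq. (9)
        theta = min(1.0, max(0.0, theta))             # Step 5: clip to [0, 1]
        # Powell restart (17); the ys <= 0 test is an added safeguard
        restart = ys <= 0.0 or abs(gg) >= 0.2 * dot(g_new, g_new)
        if not restart:
            beta = ((1.0 - theta) * dot(g_new, y)
                    + theta * dot(g_new, g_new)) / ys       # eq. (7)
            d_new = axpy(beta, s, [-gi for gi in g_new])    # Step 6, eq. (8)
            restart = dot(g_new, d_new) >= 0.0        # added safeguard
        if restart:
            d_new = [-gi for gi in g_new]
        x, g, d, d_prev_norm = x_new, g_new, d_new, d_norm
    return x, max_iter
```

A call such as `ndhsdy(f, grad, [0.0, 0.0])`, with `f` and `grad` returning the function value and the gradient as a list, returns the final point together with the number of iterations performed.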

It is well known that if $f$ is bounded along the direction $d_k$ then there exists a
stepsize $\alpha_k$ satisfying the Wolfe line search conditions (4) and (5). In our
algorithm when the Powell restart condition is satisfied, then we restart the
algorithm with the negative gradient $-g_{k+1}$. Under reasonable assumptions,
conditions (4), (5) and (17) are sufficient to prove the global convergence of the
algorithm.

The first trial of the step length crucially affects the practical behavior of the
algorithm. At every iteration $k \ge 1$ the starting guess for the step $\alpha_k$ in the line
search is computed as $\alpha_{k-1}\|d_{k-1}\| / \|d_k\|$. This selection was considered for the
first time by Shanno and Phua in CONMIN [23]. It is also considered in the
packages SCG by Birgin and Martinez [6] and SCALCG by Andrei [2,3,4].


Convergence analysis. 


Assume that: 

(i) The level set $S = \{x \in \mathbb{R}^n : f(x) \le f(x_0)\}$ is bounded.

(ii) In a neighborhood $N$ of $S$, the function $f$ is continuously differentiable
and its gradient is Lipschitz continuous, i.e. there exists a constant $L > 0$
such that $\|\nabla f(x) - \nabla f(y)\| \le L\|x - y\|$, for all $x, y \in N$.

Under these assumptions on $f$, there exists a constant $\Gamma \ge 0$ such that
$\|\nabla f(x)\| \le \Gamma$, for all $x \in S$.


In [10] it is proved that for any conjugate gradient method with strong Wolfe line 
search the following general result holds: 


Lemma 1.
Suppose that the assumptions (i) and (ii) hold and consider any conjugate
gradient method (2) and (3), where $d_k$ is a descent direction and $\alpha_k$ is obtained
by the strong Wolfe line search

$$f(x_k + \alpha_k d_k) - f(x_k) \le \rho \alpha_k g_k^T d_k, \tag{18}$$

$$|g_{k+1}^T d_k| \le -\sigma g_k^T d_k. \tag{19}$$

If

$$\sum_{k \ge 1} \frac{1}{\|d_k\|^2} = \infty, \tag{20}$$

then

$$\liminf_{k \to \infty} \|g_k\| = 0. \tag{21}$$
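
The only difference between the standard conditions (4)-(5) and the strong conditions (18)-(19) is the curvature test, which in the strong form also rejects steps that overshoot the one-dimensional minimizer by too much. The following Python sketch checks both sets for a given trial step; `dot` and `wolfe_status` are hypothetical names, and the default values of `rho` and `sigma` are the ones reported in the numerical experiments of the paper.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def wolfe_status(f, grad, x, d, alpha, rho=1e-4, sigma=0.9):
    """Return (standard, strong): does alpha satisfy (4)-(5), and (18)-(19)?"""
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    slope0 = dot(grad(x), d)          # g_k^T d_k, negative for a descent direction
    slope1 = dot(grad(x_new), d)      # g_{k+1}^T d_k
    decrease = f(x_new) - f(x) <= rho * alpha * slope0     # (4), same as (18)
    standard = decrease and slope1 >= sigma * slope0       # curvature condition (5)
    strong = decrease and abs(slope1) <= -sigma * slope0   # curvature condition (19)
    return standard, strong
```

For $f(x) = x^2$ starting at $x = 1$ with $d = -2$, the step 0.5 satisfies both sets, the step 0.99 satisfies only the standard conditions, and the step 0.01 is too short for either curvature test.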
For uniformly convex functions which satisfy the above assumptions we can
prove that the norm of $d_{k+1}$ generated by (8) and (9) is bounded above. Thus, by
Lemma 1 we have the following result.


Theorem 3.

Suppose that the assumptions (i) and (ii) hold. Consider the algorithm (2)
and (8)-(9), where $d_{k+1}$ is a descent direction and $\alpha_k$ is obtained by the strong
Wolfe line search (18) and (19). If for $k \ge 0$, $0 < \theta_k < 1$ and there exists the
nonnegative constant $\eta$ such that

$$\|g_{k+1}\|^2 \le \eta\|s_k\| \tag{22}$$

and the function $f$ is a uniformly convex function, i.e. there exists a constant
$\mu > 0$ such that for all $x, y \in S$

$$(\nabla f(x) - \nabla f(y))^T (x - y) \ge \mu\|x - y\|^2, \tag{23}$$

then

$$\lim_{k \to \infty} g_k = 0. \tag{24}$$
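Proof.
From (23) it follows that $y_k^T s_k \ge \mu\|s_k\|^2$. Now, since $0 < \theta_k < 1$, from
uniform convexity and (22) we have:

$$|\beta_k^{C}| \le \left|\frac{g_{k+1}^T y_k}{y_k^T s_k}\right| + \left|\frac{g_{k+1}^T g_{k+1}}{y_k^T s_k}\right| \le \frac{\|g_{k+1}\|\,\|y_k\|}{\mu\|s_k\|^2} + \frac{\eta\|s_k\|}{\mu\|s_k\|^2}. \tag{25}$$

But $\|y_k\| \le L\|s_k\|$, therefore

$$|\beta_k^{C}| \le \frac{\Gamma L}{\mu\|s_k\|} + \frac{\eta}{\mu\|s_k\|}.$$

Hence, with (25) we have

$$\|d_{k+1}\| \le \|g_{k+1}\| + |\beta_k^{C}|\,\|s_k\| \le \Gamma + \frac{\Gamma L + \eta}{\mu},$$

which implies that (20) is true.

Therefore, by Lemma 1 we have (21), which for uniformly convex functions is
equivalent to (24).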

For general nonlinear functions the convergence analysis of our algorithm exploits
insights developed by Gilbert and Nocedal [15], Dai and Liao [9] and
Hager and Zhang [16]. The global convergence proof of the NDHSDY algorithm is based
on the Zoutendijk condition combined with the analysis showing that the
sufficient descent condition holds and $\|d_k\|$ is bounded. Suppose that the level set
$S$ is bounded and the function $f$ is bounded from below. Additionally, assume
that there exists a constant $\gamma \ge 0$ such that $\gamma \le \|g_k\|$.


Theorem 4.

Suppose that the assumptions (i) and (ii) hold and for every $k \ge 0$ there
exist the constants $\eta \ge 0$ and $\omega \ge 0$ such that $\|g_{k+1}\| \le \eta\|s_k\|$ and
$\|g_{k+1}\| \le \omega\|g_k\|^2 / \|s_k\|^2$. If $d_k$ is a descent direction and $\nabla f(x)$ is a Lipschitz
function on $S$, then for the computational scheme (2) and (8)-(9), where
$0 < c_1 \le \theta_k < 1$ and $\alpha_k$ determined by the Wolfe line search (4)-(5) is bounded,
either $g_k = 0$ for some $k$ or

$$\liminf_{k \to \infty} \|g_k\| = 0. \tag{26}$$


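Proof.
Since $0 < \theta_k < 1$ we can write

$$|\beta_k^{C}| \le \left|\frac{g_{k+1}^T y_k}{y_k^T s_k}\right| + \left|\frac{g_{k+1}^T g_{k+1}}{y_k^T s_k}\right| \le \frac{\|g_{k+1}\|\left(\|y_k\| + \|g_{k+1}\|\right)}{|y_k^T s_k|}. \tag{27}$$

By the Wolfe condition (5) we have:

$$y_k^T s_k = (g_{k+1} - g_k)^T s_k \ge (\sigma - 1)\, g_k^T s_k = -(1-\sigma)\, g_k^T s_k.$$

On the other hand, since $0 < c_1 \le \theta_k < 1$, from Theorem 2 there exists
the constant $\delta > 0$ such that $g_k^T s_k \le -\delta\|g_k\|^2$. Therefore, $y_k^T s_k \ge (1-\sigma)\delta\|g_k\|^2$.
Hence,

$$\frac{\|g_{k+1}\|}{y_k^T s_k} \le \frac{\|g_{k+1}\|}{(1-\sigma)\delta\|g_k\|^2} \le \frac{\omega}{(1-\sigma)\delta} \cdot \frac{1}{\|s_k\|^2}.$$

On the other hand, from Lipschitz continuity we have
$\|y_k\| = \|g_{k+1} - g_k\| \le L\|s_k\|$. With these, from (27) we get

$$|\beta_k^{C}| \le \frac{\omega}{(1-\sigma)\delta} \cdot \frac{1}{\|s_k\|^2}\left(L\|s_k\| + \eta\|s_k\|\right) = \frac{\omega(L+\eta)}{(1-\sigma)\delta} \cdot \frac{1}{\|s_k\|}. \tag{28}$$

Now, we can write

$$\|d_{k+1}\| \le \|g_{k+1}\| + |\beta_k^{C}|\,\|s_k\| \le \Gamma + \frac{\omega(L+\eta)}{(1-\sigma)\delta}. \tag{29}$$

Since the level set $S$ is bounded and the function $f$ is bounded from
below, from (4) it follows that

$$0 < \sum_{k=0}^{\infty} \frac{(g_k^T s_k)^2}{\|s_k\|^2} < \infty, \tag{30}$$

i.e. the Zoutendijk condition holds. Therefore, the descent property
$g_k^T s_k \le -\delta\|g_k\|^2$ yields:

$$\sum_{k=0}^{\infty} \frac{\gamma^4}{\|s_k\|^2} \le \sum_{k=0}^{\infty} \frac{\|g_k\|^4}{\|s_k\|^2} \le \frac{1}{\delta^2}\sum_{k=0}^{\infty} \frac{(g_k^T s_k)^2}{\|s_k\|^2} < \infty,$$

which contradicts (29). Hence, $\gamma = \liminf_{k \to \infty} \|g_k\| = 0$.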


Numerical experiments. 


In this section we present the computational performance of a Fortran 
implementation of the NDHSDY algorithm on a set of 750 unconstrained 
optimization test problems. 


The test problems are the unconstrained problems in the CUTE [7] library, along 
with other large-scale optimization problems presented in [1]. We selected 75 
large-scale unconstrained optimization problems in extended or generalized form. 
Each problem is tested 10 times for a gradually increasing number of variables: 
n=1000,2000,...,10000. At the same time we present comparisons with other 
conjugate gradient algorithms, including the performance profiles of Dolan and 
Moré [12]. 
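
For readers unfamiliar with performance profiles, the following Python sketch computes them in the usual Dolan and Moré form: for each solver, the fraction of problems whose cost is within a factor $\tau$ of the best cost recorded on that problem. It is an illustration only; `performance_profile` and the input layout are hypothetical, costs are assumed to be positive, and a failure is encoded as `math.inf`.

```python
import math


def performance_profile(costs, taus):
    """costs: dict solver -> list of positive costs, one entry per problem.

    Returns dict solver -> list of fractions, one per value in taus.
    """
    solvers = list(costs)
    n = len(costs[solvers[0]])
    # Best (smallest) cost achieved on each problem by any solver
    best = [min(costs[s][p] for s in solvers) for p in range(n)]
    profile = {}
    for s in solvers:
        ratios = [costs[s][p] / best[p] for p in range(n)]
        profile[s] = [sum(r <= tau for r in ratios) / n for tau in taus]
    return profile
```

The value at $\tau = 1$ is the fraction of problems on which a solver was the fastest (ties included), while the value for large $\tau$ approaches the fraction of problems it solved at all.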

All algorithms implement the Wolfe line search conditions with $\rho = 0.0001$
and $\sigma = 0.9$, and the same stopping criterion $\|g_k\|_\infty \le 10^{-6}$, where $\|\cdot\|_\infty$ is the
maximum absolute component of a vector. The comparisons of algorithms are
given in the following context. Let $f_i^{ALG1}$ and $f_i^{ALG2}$ be the optimal values found
by ALG1 and ALG2, for problem $i = 1, \ldots, 750$, respectively.


We say that, in the particular problem $i$, the performance of ALG1 was
better than the performance of ALG2 if:

$$|f_i^{ALG1} - f_i^{ALG2}| < 10^{-3} \tag{31}$$

and the number of iterations, or the number of function-gradient evaluations, or
the CPU time of ALG1 was less than the number of iterations, or the number of
function-gradient evaluations, or the CPU time corresponding to ALG2,
respectively.
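
The pairwise comparison described above can be sketched as a small counting routine. The Python code below is illustrative and not the author's implementation; `compare` and the record layout (a dict with the final value under `"f"` and one cost metric per key) are hypothetical, and the tolerance of criterion (31) is passed explicitly as `tol`.

```python
def compare(alg1, alg2, tol, metric="iters"):
    """Count (wins of ALG1, wins of ALG2, ties) over paired problem records.

    A problem is counted only if both final function values agree within tol,
    as in criterion (31); the winner is the one with the smaller metric.
    """
    wins1 = wins2 = ties = 0
    for r1, r2 in zip(alg1, alg2):
        if abs(r1["f"] - r2["f"]) >= tol:
            continue                  # final values differ: problem not counted
        if r1[metric] < r2[metric]:
            wins1 += 1
        elif r1[metric] > r2[metric]:
            wins2 += 1
        else:
            ties += 1
    return wins1, wins2, ties
```

The three counts sum to the number of problems that pass the test on the final function values, which is why the totals reported for each pair of algorithms can be smaller than the size of the test set.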


In this numerical study we declare that a method solved a particular problem if the
final point obtained has the lowest functional value among the tested methods (up
to the $10^{-3}$ tolerance specified in (31)).


Clearly, this criterion is acceptable for users that are interested in minimizing
functions and not finding critical points.


All codes are written in double precision Fortran and compiled with f77 (default 
compiler settings) on an Intel Pentium 4, 1.8GHz workstation. All these codes are 
authored by Andrei. 


In the first set of numerical experiments we compare the performance of
NDHSDY to the HS and DY conjugate gradient algorithms. Figure 1 presents the
Dolan and Moré CPU performance profiles of NDHSDY versus HS and DY,
respectively.


NDHSDY versus Hestenes-Stiefel (HS). CPU time metric, 704 problems.

          NDHSDY     HS      =
#iter        277    244    183
#fg          283    316    105
cpu          413    203     88


NDHSDY versus Dai-Yuan (DY). CPU time metric, 702 problems.

          NDHSDY     DY      =
#iter        435     79    188
#fg          396    192    114
cpu          521    113     68


Fig. 1. Performance based on CPU time. NDHSDY versus HS and DY.


When comparing NDHSDY to HS, subject to the number of iterations, we see that 
NDHSDY was better in 277 problems (i.e. it achieved the minimum number of 
iterations in 277 problems), HS was better in 244 problems and they achieved the 
same number of iterations in 183 problems etc. 


Out of 750 problems, only for 704 problems the criterion (31) holds. Similarly, we 
see the number of problems for which NDHSDY was better than DY. Observe 
that the convex combination of HS and DY, expressed as in (7), is far more 
successful than HS or DY algorithms. 
Figure 2 presents the performance profiles of NDHSDY versus the conjugate
gradient algorithms: PRP, PRP+, LS and CD. It seems that the best algorithm is
the hybrid algorithm NDHSDY given by a convex combination of HS and DY,
where the parameter in the convex combination is obtained using the Newton
direction.


NDHSDY versus Polak-Ribiere-Polyak (PRP). CPU time metric, 710 problems.

          NDHSDY    PRP      =
#iter        321    187    202
#fg          338    242    130
cpu          469    154     87


NDHSDY versus Polak-Ribiere-Polyak + (PRP+). CPU time metric, 716 problems.

          NDHSDY   PRP+      =
#iter        284    176    256
#fg          285    254    177
cpu          461    173     82


NDHSDY versus Liu-Storey (LS). CPU time metric, 688 problems.

          NDHSDY     LS      =
#iter        283    216    189
#fg          299    275    114
cpu          409    196     83


NDHSDY versus Conjugate Descent (CD). CPU time metric, 708 problems.

          NDHSDY     CD      =
#iter        487     34    187
#fg          437    116    119
cpu          558     88     62


Fig. 2. Performance profiles of NDHSDY versus some conjugate gradient algorithms.


Observe that the NDHSDY algorithm is the top performer. Since these codes use the
same Wolfe line search and the same stopping criterion, they differ only in their
choice of the search direction.

Hence, among these hybrid conjugate gradient algorithms we considered here, 
NDHSDY appears to generate the best search direction. 

Also, the algorithm has better performance profiles than those corresponding to 
HS and DY. In this numerical study we noticed that for most of the iterations the 


NDHSDY algorithm uses (, . 
Referring to the condition (10) we noticed that $(y_k^T g_{k+1})(s_k^T g_{k+1}) / y_k^T s_k$ tends to
zero faster than $\|g_{k+1}\|^2$.
For most of the iterations the condition (10) is satisfied, i.e. the algorithm has a
self-adjusting property in the sense given in [8]. It is worth saying that the
condition (10) is more often satisfied after those iterations in which $\beta_k$ is computed
according to the HS or DY rules.

Introducing (10) as a restart criterion does not improve the performance of the
algorithm.


On the other hand, the conditions $\|g_{k+1}\| \le \eta\|s_k\|$ and $\|g_{k+1}\| \le \omega\|g_k\|^2 / \|s_k\|^2$ from
Theorem 4 say that $\|g_{k+1}\|^3 \le \omega\eta^2\|g_k\|^2$. We noticed that there exists a $k_0$ such that
for any iteration $k \ge k_0$ the above condition $\|g_{k+1}\|^3 \le \omega\eta^2\|g_k\|^2$ is satisfied,
illustrating the global convergence.


Conclusion.


We know a large variety of conjugate gradient algorithms. In this paper we have
presented a new hybrid conjugate gradient algorithm in which the famous
parameter $\beta_k$ is computed as a convex combination of $\beta_k^{HS}$ and $\beta_k^{DY}$.


For uniformly convex functions, if the gradient is bounded in the sense that
$\|g_{k+1}\|^2 \le \eta\|s_k\|$ and the line search satisfies the strong Wolfe conditions, then our
hybrid conjugate gradient algorithm is globally convergent.


For general nonlinear functions, if the parameter $\theta_k$ from the $\beta_k^{C}$ definition is
bounded, and both $\|g_{k+1}\| \le \eta\|s_k\|$ and $\|g_{k+1}\| \le \omega\|g_k\|^2 / \|s_k\|^2$ are satisfied, where $\eta$
and $\omega$ are nonnegative constants, then our hybrid conjugate gradient algorithm is globally
convergent.


The performance profile of our algorithm was higher than those of the well 
established conjugate gradient algorithms for a set consisting of 750 
unconstrained optimization problems some of them from CUTE library and some 
others we presented in [1]. 